Since the discovery of alpha-1 antitrypsin in the early 1960s, several new genes have been suggested to play a role in chronic obstructive pulmonary disease (COPD) pathogenesis. Yet, in spite of those advances, much about the genetic basis of COPD remains to be discovered. Unbiased approaches, such as genome-wide association (GWA) studies, are critical to identify genes and pathways and to verify suggested genetic variants. Indeed, most of our current understanding of COPD candidate genes originates from GWA studies. Cross-study replications and advanced meta-analyses have propelled the field towards unravelling the details of COPD pathogenesis. Here, we review the discovery of genetic variants associated with COPD phenotypes by discussing the available approaches and current findings. Limitations of current studies are considered and future directions provided.

Introduction

The Global Initiative for Chronic Obstructive Lung Disease (GOLD) defines chronic obstructive pulmonary disease (COPD) as a disease state associated with airflow obstruction that is not fully reversible (http://www.goldcopd.org/). COPD is currently the fourth leading cause of death, and the World Health Organization reports that it is likely to become the third leading cause by 2030. According to the World Health Organization, COPD is the most common serious chronic disease worldwide, affecting about 64 million people (The global burden of disease: 2004 update, published in 2008). Hence, COPD represents a large and increasing burden to the health care system. Unfortunately, disease-modifying therapy for COPD is limited, and an improved understanding of the pathogenetic mechanisms, leading to novel therapeutic interventions and preventive strategies, is greatly needed. Understanding the genetic predisposition to COPD is essential to develop personalized treatment regimens (Shapiro, 2011). This Review aims to highlight the advances in the discovery of genetic variants associated with COPD by discussing the available approaches and current findings.

Chronic obstructive pulmonary disease is a multi-factorial disorder caused by environmental determinants, most commonly cigarette smoking, and genetic risk factors (Decramer et al, 2012). In addition to cigarette smoking, COPD can also be caused by other environmental factors, particularly indoor biomass smoke exposure in developing countries (Kennedy & Chambers, 2007). COPD is diagnosed by spirometry showing an irreversible decrease in forced expiratory volume in 1 s (FEV1) and in the ratio of FEV1 to forced vital capacity (FEV1/FVC). Although there is a dose-response relationship between FEV1 and the amount of smoke exposure, the FEV1 decline among smokers with similar exposure varies considerably (Burrows et al, 1977; Fletcher, 1976). This suggests that, in addition to cigarette smoking (and potentially other environmental factors), COPD is also influenced by genetic risk factors (Fig 1). For over 45 years, we have known that genetic variants in the alpha-1 antitrypsin (AAT) gene serpin peptidase inhibitor, clade A, member 1 (SERPINA1) lead to COPD. However, AAT deficiency accounts for only 1-2% of all COPD cases. Thus, other variants in the genome are likely to be associated with COPD traits. Finally, it will be important to unravel how environment and genes interact in COPD pathogenesis. As with other chronic inflammatory diseases, epigenetic changes (Yao & Rahman, 2012) and somatic mutations (Tzortzaki et al, 2012) have been shown to be involved in the pathogenesis of COPD.
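To make the spirometric criterion concrete, the sketch below computes the FEV1/FVC ratio and flags airflow obstruction. It assumes the GOLD fixed-ratio rule (post-bronchodilator FEV1/FVC below 0.70) and the GOLD severity grades by FEV1 percent predicted (grade 1: at least 80%; 2: 50-79%; 3: 30-49%; 4: below 30%). These cut-offs are not stated in this review, and the `Spirometry` class and `gold_grade` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed GOLD cut-offs (not given in this review): fixed ratio and FEV1 % predicted bands.
FIXED_RATIO = 0.70
GRADE_BANDS = [(80.0, 1), (50.0, 2), (30.0, 3)]  # (lower bound of FEV1 % predicted, grade)


@dataclass
class Spirometry:
    """Hypothetical container for post-bronchodilator spirometry values."""
    fev1_l: float         # forced expiratory volume in 1 s, litres
    fvc_l: float          # forced vital capacity, litres
    fev1_pct_pred: float  # FEV1 as a percentage of the predicted value


def fev1_fvc_ratio(s: Spirometry) -> float:
    """Ratio of FEV1 to FVC."""
    return s.fev1_l / s.fvc_l


def gold_grade(s: Spirometry) -> Optional[int]:
    """Return an assumed GOLD grade 1-4, or None if FEV1/FVC >= 0.70 (no obstruction)."""
    if fev1_fvc_ratio(s) >= FIXED_RATIO:
        return None
    for lower_bound, grade in GRADE_BANDS:
        if s.fev1_pct_pred >= lower_bound:
            return grade
    return 4  # FEV1 below 30% of predicted


if __name__ == "__main__":
    # Hypothetical subjects, not data from any cohort discussed here.
    print(gold_grade(Spirometry(fev1_l=1.6, fvc_l=3.2, fev1_pct_pred=55.0)))  # prints 2
    print(gold_grade(Spirometry(fev1_l=3.0, fvc_l=3.8, fev1_pct_pred=95.0)))  # prints None
```

The ratio defines whether obstruction is present at all; FEV1 percent predicted only grades its severity once the ratio criterion is met.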

As with many chronic complex diseases, it has been difficult to unravel the genetic predisposition and pathogenetic mechanisms of COPD. This is in part due to the heterogeneous nature of the disease. For example, the airflow obstruction that defines COPD can result from destruction and enlargement of alveoli (i.e. emphysema) with loss of elastic recoil, from obstruction of small airways, or from both (Hogg et al, 2004). Both of these processes occur with smoking but are not mechanistically related. Therefore, identifying the genetic basis of one trait does not justify extrapolating its genetic determinants to other phenotypes. Rather, different phenotypic traits may be determined by complex genetic networks, which may or may not overlap. Improved phenotypic measurement of discrete disease traits, such as computerized tomography (CT) for emphysema and spirometry primarily for small airway disease, will allow investigators to identify genotype-phenotype correlations more precisely (Kim et al, 2009).

Genetic approaches 

Family, twin and segregation studies 

Basic genetic approaches include family, twin and segregation studies. Early epidemiological studies found that COPD aggregates in families (Larson et al, 1970; Higgins et al, 1984; Tager & Speizer, 1976), showing stronger correlations between parents and children or between siblings than between spouses. Twin (Redline et al, 1987; Redline, 1990) and segregation studies (Givelber et al, 1998) suggested that genetic susceptibility to COPD is due to many genes with small effects (Chen et al, 1996; Givelber et al, 1998). These early discoveries initiated the search for novel gene variants with gene-association and linkage studies.

Candidate gene-association studies 

Candidate gene-association studies examine genes postulated to play a central role in COPD pathogenesis and measure the strength of association between disease traits and variants in those candidate genes. Early genetic studies of COPD were gene-association studies focusing primarily on genes from the protease-antiprotease and oxidant-antioxidant pathways. However, given the diverse pathways (such as inflammation, innate immunity, cell death, matrix repair mechanisms and lung development) involved in COPD pathogenesis, it is likely that other genes contribute as well. Moreover, inconsistencies among those studies hindered progress towards clarifying the genetic basis of COPD. The contradictory findings were mostly driven by small study cohorts, non-standardized disease definitions and varying statistical methods (including differences in adjusting for race, ethnicity, gender, environment and genetic background). A recent meta-analysis of candidate genes showed that many of the gene variants tested in gene-association studies are not, in fact, associated with COPD (Smolonska et al, 2009). Nevertheless, in spite of the overall disappointing results, a few studies appear promising, namely for MMP12, and will be discussed in detail below (Hersh et al, 2011; Hunninghake et al, 2009).

Glossary

Candidate gene-association studies

A candidate gene-association study examines the associations between a previously specified gene and the phenotype of interest.

Chronic obstructive pulmonary disease (COPD)

A progressive lung disease that makes it hard to breathe.

Computerized tomography (CT)

Medical imaging procedure that uses computer-processed X-rays to produce tomographic images or 'slices' of specific areas of the body.

Family, twin and segregation studies

Family and twin studies are association studies that aim to avoid confounding by population stratification by using family members as cases and controls. Segregation studies determine whether a major gene is associated with a phenotype of interest.

Forced expiratory volume in one second (FEV1)

The volume of air that can forcibly be blown out in 1 s after full inspiration.

Genetic variants

Variations in the genome between members of a species or between groups of species, including SNPs (common genetic variants), mutations (rare genetic variants) and copy-number variations.

Genome-wide association (GWA) studies

Examination of many common genetic variants in different individuals to investigate whether any variant is associated with a certain trait.

Linkage disequilibrium

The occurrence in a population of two linked alleles at a frequency higher or lower than expected on the basis of the frequencies of the individual alleles.

Linkage study

The formal study of the association between the inheritance of a condition in a family and a particular chromosomal locus.

Meta-analysis

Method focused on contrasting and combining results from different studies, in the hope of identifying patterns among study results, sources of disagreement among those results or other relationships that may come to light in the context of multiple studies.

Next-generation sequencing

High-throughput sequencing; technologies that parallelize the sequencing process, producing thousands or millions of sequences at once.

Pack year

A way to measure the amount a person has smoked over a long period of time. Calculated by multiplying the number of packs of cigarettes smoked per day by the number of years the person has smoked.

Pathogenesis

The mechanism by which the disease is caused.

Polymorphic marker

A length of DNA that displays population-based variability so that its inheritance can be followed.

Single nucleotide polymorphism (SNP)

DNA sequence variation occurring when a single nucleotide in the genome differs between members of a biological species or between paired chromosomes in an individual.

Spirometry

Literally 'measuring of breath'; the most common pulmonary function test, measuring the amount (volume) and/or speed (flow) of air that can be inhaled and exhaled.

Whole-exome sequencing

Technique to selectively sequence the coding regions of the genome.

Figure 1. COPD is caused by chronic environmental insults (in particular cigarette smoking) in individuals with predispositions due to variations in one or multiple genes. The combination of environment and genes leads to distinct aberrant pathophysiological processes/pathways, the combination of which causes COPD.

Linkage studies 

As opposed to candidate gene-association studies, in which genes are chosen in advance, linkage studies represent an unbiased approach and are not limited by an incomplete understanding of disease pathogenesis. Polymorphic markers spread across the entire genome are examined for their association with the phenotype of interest. Yet, because of the low marker density, the identified loci are often large and can contain several hundred genes that need to be sorted through to find those associated with the disease. Fine-mapping procedures can eventually narrow the regions to more defined locations and potentially identify novel genes (DeMeo et al, 2006; Wilk et al, 2003). However, linkage studies lack the statistical power needed to identify genetic loci with the small effects that are commonly associated with complex diseases such as COPD (Risch & Merikangas, 1996). With the recent availability of high-density single nucleotide polymorphism (SNP) panels for whole-genome association studies, linkage studies have largely been abandoned.



Genome-wide association studies

Genome-wide association (GWA) studies provide an unbiased, hypothesis-free approach to identifying genome variations associated with disease phenotypes (Soler Artigas, 2012). We have come a long way since the first COPD GWA study: not only have novel candidate genes been identified, but the methods have also been refined along the way to improve the accuracy of results. Because dense SNP maps are used (generally hundreds of thousands of SNPs), the search for novel genes can be pinpointed more accurately than with linkage analysis. However, GWA studies also have limitations. Sample sizes are often small: the genome variants underlying lung function are believed to have modest effects, so very large populations are required to identify them. In addition, the lack of large-scale follow-up studies increases the risk of false-positive associations. Also, the SNPs on a panel are often not the disease-associated genetic variants per se but may instead be in linkage disequilibrium (LD) with them. A potential strategy to resolve these issues was proposed recently at an international COPD genetics conference, where it was suggested that a COPD Genetics Consortium be formed to promote collaborations between investigators of existing COPD populations (Silverman et al, 2011). A similar approach has been initiated by the SpiroMeta Consortium, which combines multiple GWA studies of subjects of European ancestry in a large-scale meta-analysis (Obeidat et al, 2011). Such consortia provide an approach for empowering GWA studies and accelerating the identification of common genome variations associated with COPD.
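Why modest effects demand very large populations can be shown with a rough power calculation. The sketch below approximates the power of a two-sided, one-degree-of-freedom additive association test for a standardized quantitative trait (for example an FEV1 z-score) in unrelated individuals. The effect size, allele frequency and the 5 × 10⁻⁸ threshold are illustrative assumptions rather than values from this review, and `gwas_power` is a hypothetical helper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gwas_power(n: int, maf: float, beta_sd: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided 1-df additive association test.

    Assumptions (illustrative, not from this review): unrelated individuals,
    a standardized quantitative trait and a per-allele effect `beta_sd`
    measured in trait standard deviations.
    """
    # Fraction of trait variance explained by the SNP under an additive model.
    q2 = 2.0 * maf * (1.0 - maf) * beta_sd ** 2
    # Expected test statistic: square root of the non-centrality parameter.
    shift = (n * q2 / (1.0 - q2)) ** 0.5
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |Z + shift| exceeds the critical value.
    return (1.0 - _STD_NORMAL.cdf(z_crit - shift)) + _STD_NORMAL.cdf(-z_crit - shift)


if __name__ == "__main__":
    # Hypothetical modest effect: minor allele frequency 0.3, 0.05 SD per allele.
    for n in (1_000, 10_000, 50_000):
        print(n, round(gwas_power(n, maf=0.3, beta_sd=0.05), 3))
```

Under these assumptions, power is negligible with 1,000 subjects, still low with 10,000 and high with 50,000, which is the practical argument for pooling cohorts into consortia.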

In the very near future, we will be able to use whole-genome information obtained by next-generation sequencing, which will not only improve our ability to identify common variants but also help tease out the role of rare and structural genomic variations. However, many challenges must be overcome before whole-genome sequencing becomes routine. For Freeman-Sheldon syndrome 2 and Miller syndrome, it has already been demonstrated that whole-exome sequencing can identify the underlying disease gene (Biesecker, 2010; Ng et al, 2010). Whole-exome sequencing was also applied successfully to identify DNMT3A mutations in acute myeloid leukaemia (Ley et al, 2010). While whole-exome sequencing has advantages in cost and coverage, rapid cost reductions in whole-genome sequencing will likely make whole-exome sequencing less useful, since it covers only 1-2% of the genome, albeit an important 1-2%.

In summary, although progress in resolving the genetic basis of COPD has been slow since the discovery of AAT in the early 1960s, techniques have recently improved greatly, and advances in defining COPD genes have accelerated and will continue to do so. The currently accepted and recently suggested COPD genes are discussed below (Table 1 and Fig 1).

Accepted COPD genes 

Alpha-1 antitrypsin, encoded by the SERPINA1 gene, is a member of the serine protease inhibitor (serpin) superfamily. AAT is mainly produced in the liver and is the major physiologic inhibitor of the serine protease neutrophil elastase (NE; Stoller & Aboussouan, 2011). In addition to NE, AAT inhibits other serine proteinases, including proteinase 3 (PR3) (Esnault et al, 1993) and cathepsin G (Topic et al, 2009). AAT also inhibits kallikreins (Felber et al, 2006), matriptase (Janciauskiene et al, 2008), caspase-3 (Miller et al, 2007) and ADAM-17 (Bergin et al, 2010).

Alpha-1 antitrypsin deficiency was first described in 1964 in two patients with severe respiratory insufficiency due to emphysema (Eriksson, 1964). Identification of the AAT variant was made possible by the development of the Pi system, in which AAT mutants migrate distinctly from the normal M form in an electric field. The most common variant, the Z isoform, results from a single amino acid substitution from glutamic acid to lysine (i.e. Glu342Lys), which perturbs the protein structure and causes its defective secretion from hepatocytes (Kass et al, 2012). This remarkable story not only shows how a clinical diagnosis can be linked to the genetic basis of a COPD phenotype, but also highlights how long it took, with the limited tools of the past, to go from clinical observation (1963) to identification of the amino acid substitution (1978). Fortunately, technical advances in unravelling the pathogenetic basis of diseases have greatly accelerated gene finding today. However, at present, the Z variant of AAT remains the only truly accepted genome variant associated with COPD.

Suggested COPD genes 

Early COPD GWA studies: interleukin 6 receptor (IL6R) and glutathione S-transferase omega 2 (GSTO2)

Wilk and colleagues reported a GWA study of lung function measures in 2007 (Wilk et al, 2007). The authors collected several spirometry parameters from 1220 related individuals who participated in the Framingham Heart Study (FHS) and performed association studies using 70,987 SNPs from the Affymetrix 100K SNP GeneChip. The location of the strongest associations differed depending on the physiological phenotype. Percent predicted forced expiratory flow between 25% and 75% of FVC (FEF25-75) was weakly associated with a SNP in the IL6R region on 1q21 (rs4129267; p-value = 0.07), whereas FEV1 and FVC were most significantly associated with the GSTO2 region on 10q25.1 (rs156697; p-value = 9.42 × 10⁻⁵). Although the findings of this study were groundbreaking at the time, there were shortcomings. In particular, neither association reached genome-wide significance: while the non-synonymous SNP of GSTO2 reached a p-value on the order of 10⁻⁵, the SNP at the IL6R locus only reached a p-value of 0.07. Most likely, these shortcomings were at least partially due to the low-density genome coverage with fewer than 100,000 SNPs, which may have given rise to ill-defined associations.
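The statement that neither association reached genome-wide significance can be checked directly against two common thresholds. The sketch below assumes a Bonferroni correction over the 70,987 tested SNPs and the widely used 5 × 10⁻⁸ genome-wide threshold; neither threshold is specified in this review, and the helper names are hypothetical. The p-values are those quoted above for Wilk et al (2007).

```python
CONVENTIONAL_GWS = 5e-8  # widely used genome-wide significance threshold (assumption)


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test threshold that controls the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


def significance_report(p_values: dict, n_tests: int) -> dict:
    """For each SNP, report whether it passes the Bonferroni and conventional thresholds."""
    bonf = bonferroni_threshold(n_tests)
    return {
        snp: {"bonferroni": p < bonf, "genome_wide": p < CONVENTIONAL_GWS}
        for snp, p in p_values.items()
    }


if __name__ == "__main__":
    # p-values and SNP count as reported for Wilk et al (2007).
    wilk_2007 = {"rs4129267 (IL6R)": 0.07, "rs156697 (GSTO2)": 9.42e-5}
    print(bonferroni_threshold(70_987))
    print(significance_report(wilk_2007, n_tests=70_987))
```

Both SNPs fail both thresholds: the Bonferroni bound for 70,987 tests is roughly 7 × 10⁻⁷, about two orders of magnitude below the GSTO2 p-value.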

IL6R, the receptor of interleukin 6 (IL6), is involved in both pro- and anti-inflammatory processes. IL6R also exists in a soluble form that forms a complex with IL6. The IL6/IL6R complex appears to play a role in cigarette smoke-induced inflammation: it recruits inflammatory cells to the lung to eliminate foreign particles such as cigarette smoke components, but it also has a myriad of other effects on lung tissue. Finally, IL6 (the IL6R ligand) has been shown to be associated with lung function in the Framingham offspring population (Walter et al, 2008).

GSTO2, a member of the glutathione S-transferase family of proteins involved in metabolizing xenobiotics and carcinogens, has been postulated to play a role in COPD because of its involvement in arsenic biotransformation, as arsenic is a component of cigarette smoke (Mukherjee et al, 2006).

Hedgehog-interacting protein (HHIP) 

Two years after the first COPD GWA publication, the same investigators published findings from the FHS population that addressed some of the issues raised by their initial study (Wilk et al, 2009). Foremost, the SNP panel was more than five times larger, with 550,000 SNPs. The number of subjects was also increased from 1220 to 7691. Another advantage of this investigation was that significant SNPs were tested in a second, unrelated population, the Family Heart Study cohort. This time, the investigators used FEV1/FVC to characterize patients. Four linked SNPs on chromosome (Chr) 4 at about 145 Mb (i.e. 4q31) reached genome-wide significance. One of those four SNPs (rs13147758) was genotyped in the Family Heart Study, but in this replication study it did not reach genome-wide significance. However, other studies found SNP associations on 4q31 (Hancock et al, 2010; Repapi et al, 2010; Zhou et al, 2012), thus strengthening the evidence that this locus harbours a novel COPD gene. The SNPs on Chr 4 were located in an intergenic region just downstream of the 5' start site of HHIP, suggesting a potential role in the regulation of HHIP expression. Alternatively, these SNPs could be in linkage disequilibrium with the disease-causing variant. Together, these findings provide compelling evidence that this candidate locus may truly influence airflow obstruction in COPD patients. HHIP, a hedgehog-interacting protein, is involved in hedgehog signalling and has been shown to play a role in lung development (Shi et al, 2009). Lung development is relevant to COPD because abnormal lung development could lead to impaired lung reserve, predisposing smokers to COPD. It has also been shown that other lung growth and remodelling genes, such as WNT, are re-activated (Tzortzaki et al, 2012), which indicates that abnormal remodelling and repair mechanisms are important molecular processes involved in COPD.

Table 1. Overview of COPD genes and details of their study of origin

| Year | Gene | Chr | Band | Approach | Phenotype | #SNPs | Subjects | Primary population(s) | Replication population | Potential function of variants | Reference |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1964 | AAT | 14 | q32.13 | Pi system (electrophoresis) | Respiratory insufficiency | NA | 2 patients | | NA | Protease inhibition | Eriksson (1964) |
| 2007 | IL6R | 1 | q21.3 | GWA study | FEF25-75 | 70,987 | 1220 (fb) | FHS | NA | Immune mechanisms | Wilk et al (2007) |
| 2007 | GSTO2 | 10 | q25.1 | GWA study | FEV1, FVC | 70,987 | 1220 (fb) | FHS | NA | Arsenic biotransformation | Wilk et al (2007) |
| 2009 | HHIP | 4 | q31.21 | GWA study | FEV1/FVC | 550,000 | 7691 (fb) | FHS | Family Heart Study, CHARGE Consortium, SpiroMeta Consortium | Lung development by hedgehog signalling | Wilk et al (2009) |
| 2009 | IREB2 | 15 | q25.1 | GWA study | FEV1/FVC | 561,466 | 823 (810) | Bergen cohort | ICGN, NETT/NAS, BEOCOPD | Pulmonary iron homeostasis | Pillai et al (2009) |
| 2009 | MMP12 | 11 | q22.3 | Gene-association study | FEV1 | SNPs in linkage with MMP12 | 8300 | Genetics of Asthma in Costa Rica Study, CAMP, Children, Allergy, Milieu, Stockholm, Epidemiological Survey, BEOCOPD, NETT, Lovelace Smokers Cohort, NAS | | Elastase activity | Hunninghake et al (2009) |
| 2010 | FAM13A | 4 | q22.1 | GWA study | FEV1/FVC | 550,000 | 2940 (1380) | Bergen cohort, NETT/NAS, ECLIPSE | COPDGene, ICGN, BEOCOPD, CHARGE Consortium | Oxidative stress and impaired apoptosis | Cho et al (2010) |
| 2010 | GSTCD; TNS1; AGER; HTR4; THSD4 | 4; 2; 6; 5; 15 | q24; q35; p21.3; q32; q23 | GWA study | FEV1; FEV1; FEV1/FVC; FEV1; FEV1/FVC | 2,705,257 | 20,288 | 12 GWA studies (European origin) | CHARGE Consortium (GSTCD, AGER, HTR4) | Developmental and remodelling pathways | Repapi et al (2010) |
| 2011 | BICD1 | 12 | p11.21 | GWA study | CT scan | 550,000 | 2380 | ECLIPSE, NETT/NAS, Bergen cohort | | Telomere shortening | Kong et al (2011) |
| 2011 | SOX5 | 12 | p12.1 | GWA study/gene-association study | FEV1, FEV1/FVC | 1387 | 386 (424) | NETT/NAS | BEOCOPD | Developmental lung morphogenesis | Hersh et al (2011) |
| 2011 | MFAP2; MECOM; SPATA9; ARMC2; NCR3; ZKSCAN3; CDC123; C10orf11; LRP1; CCDC38; MMP15; CFDP1; KCNE2 | 1; 3; 5; 6; 6; 6; 10; 10; 12; 12; 16; 16; 21 | | Meta-analysis of GWA studies | FEV1, FEV1/FVC | ~2,500,000 | 48,201 | SpiroMeta Consortium, CHARGE Consortium | CARDIA, CROATIA-Split, LifeLines, LBC1936, MESA-Lung, RS-III, TwinsUK-II | Antigen of elastin-associated microfibrils; epithelial repair process; extracellular collagen accumulation; regulation of gene expression; premature alveolar septation | Soler Artigas et al (2011) |
| 2012 | RAB4B; EGLN2; MIA; CYP2A6 | 19 | | | | | | Bergen cohort, COPDGene | | Response to cell stress; ion transport in airway epithelial cells; nicotine dependence | Cho et al (2012) |

Numbers in parentheses in the Subjects column are controls.

α-Nicotinic acetylcholine receptor (CHRNA3/5) locus and iron-responsive element binding protein 2 (IREB2)

At the same time the HHIP candidate locus was published, Pillai et al published a GWA study identifying the CHRNA3/5 locus at 15q25.1 (Pillai et al, 2009). Here, the primary study population was a case-control cohort from Bergen, Norway, with 823 COPD patients and 810 control subjects. The top 100 associations were further investigated in three other cohorts: the International COPD Genetics Network (ICGN; cases and controls), the US National Emphysema Treatment Trial (NETT; COPD cases) and the Normative Aging Study (NAS; controls), as well as the Boston Early-Onset COPD (BEOCOPD) cohort. Similar to the HHIP publication, the phenotypes investigated here were FEV1/FVC and post-bronchodilator FEV1 (the latter only in BEOCOPD). Two SNPs on Chr 15 at the CHRNA3/5 locus (rs8034191 and rs1051730) reached genome-wide significance and were replicated successfully in the independent study cohorts. This Chr 15 locus had previously been studied in association with nicotine dependence and thus represented a promising candidate region (Berrettini, 2008; Saccone et al, 2007; Siedlinski et al, 2011). Interestingly, the SNP associations were significant with and without adjustment for smoking exposure in the original Norwegian cohort, and a significant SNP-by-pack-years interaction was observed in the ICGN replication population. These observations suggested that the differences between COPD patients and controls were more likely due to genetic determinants of smoking behaviour (i.e. nicotine addiction) than to genetic determinants of COPD per se. This interpretation is supported by the significant associations observed between the CHRNA3/5 locus and smoking behaviour in lung cancer (Spitz et al, 2008; Thorgeirsson et al, 2008). However, another lung cancer study did not find this locus to be associated with smoking behaviour (Cantrell et al, 2008). Therefore, further investigation is required to characterize the effects of the Chr 15 locus with regard to smoking behaviour, lung cancer or both. An integrative genomics approach (i.e. combined gene expression and genetic association studies) independently identified variants in IREB2 that are in tight LD with the CHRNA3/5 variants, suggesting IREB2 as a likely COPD candidate gene at the CHRNA3/5 locus (DeMeo et al, 2009). IREB2 belongs to the iron regulatory protein (IRP) family, which maintains iron homeostasis by regulating iron uptake and distribution. IREB1 and IREB2 maintain cellular iron metabolism (Rouault, 2006). Regional differences in iron and IRPs exist in smokers (Nelson et al, 1996), which can potentially lead to variation in oxidative stress in the lung, a mechanism of importance in emphysema and lung cancer.
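"Tight LD" has a precise meaning in terms of the glossary definition of linkage disequilibrium: the joint frequency of two alleles departs from the product of their individual frequencies. The sketch below computes the standard pairwise measures D, D' and r² from haplotype and allele frequencies. The frequencies used are hypothetical and are not estimates for the IREB2 or CHRNA3/5 SNPs; `ld_stats` is an illustrative helper, and D' is returned signed.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple:
    """Pairwise linkage disequilibrium from haplotype and allele frequencies.

    p_ab -- frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a  -- frequency of allele A at locus 1
    p_b  -- frequency of allele B at locus 2
    Returns (D, D', r^2); D' is signed (negative when D < 0).
    """
    d = p_ab - p_a * p_b  # deviation from the expectation under independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies for illustration only.
    print(ld_stats(p_ab=0.40, p_a=0.40, p_b=0.40))  # alleles always co-occur
    print(ld_stats(p_ab=0.16, p_a=0.40, p_b=0.40))  # independent loci
    print(ld_stats(p_ab=0.33, p_a=0.40, p_b=0.35))  # strong but incomplete LD
```

An r² close to 1 means one SNP is an almost perfect proxy for the other, which is why an associated GWA marker may simply tag a nearby causal variant rather than being causal itself.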

Family with sequence similarity 13, member A1 (FAM13A)

The independent populations in which the CHRNA3-CHRNA5-IREB2 and HHIP loci were identified were combined, resulting in the identification of the FAM13A locus (Cho et al, 2010). In total, the investigators used 2940 COPD cases and 1380 controls (i.e. current and former smokers) from three populations: (i) the case-control population from Norway; (ii) a cohort consisting of NETT cases and NAS controls; and (iii) a case and control population from the multi-centre Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE). The two most significantly associated SNPs (rs7671167 and rs1903003; r² = 0.85) were found at 4q22.1 within a FAM13A intron, located just downstream of the Rho-GTPase-activating protein (Rho-GAP) domain. To verify their findings, the investigators genotyped the most significant SNPs in the COPDGene Study population. SNP associations for the top two SNPs were also tested in the ICGN and BEOCOPD populations. Associations of rs7671167 were significant in COPDGene and ICGN and showed a trend toward significance in BEOCOPD. Furthermore, an independent GWA investigation of lung function using populations from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium reported an association of FAM13A with FEV1/FVC (Hancock et al, 2010). Evidence for a possible role of FAM13A in COPD includes its differential expression during hypoxia in cultured epithelial and endothelial cells (Chi et al, 2006) and during epithelial cell differentiation of alveolar type II cells (Wade et al, 2006). Differences in FAM13A expression have also been observed between mild and severe cystic fibrosis patients (Wright et al, 2006). The significant SNPs were not associated with pack-years of cigarette smoking; thus, FAM13A most likely mediates the genetics of lung function, or potentially COPD, as opposed to smoking behaviour. A recent report also shows an independent association of the FAM13A locus with lung cancer (Young & Hopkins, 2011).
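The case-control comparisons described above rest on testing whether allele frequencies differ between cases and controls. The sketch below shows one standard form, the allelic 2 × 2 chi-square test with an allelic odds ratio and a Woolf 95% confidence interval. It is not necessarily the model used by Cho et al (2010), who may have used regression with covariates. The cohort sizes (2940 cases, 1380 controls) come from the text, but the risk-allele frequencies (0.35 in cases, 0.30 in controls) are hypothetical.

```python
import math


def allelic_test(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Allelic (2x2) case-control association test.

    Counts are alleles, not people (two per diploid subject). Returns the
    Pearson chi-square statistic (1 df), its p-value, the allelic odds ratio
    and a Woolf 95% confidence interval for the odds ratio.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a 1-df chi-square distribution: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
          math.exp(math.log(odds_ratio) + 1.96 * se_log_or))
    return chi2, p_value, odds_ratio, ci


if __name__ == "__main__":
    # Hypothetical risk-allele frequencies: 0.35 in 2940 cases (5880 alleles)
    # and 0.30 in 1380 controls (2760 alleles).
    chi2, p, or_, ci = allelic_test(2058, 3822, 828, 1932)
    print(f"chi2={chi2:.1f} p={p:.1e} OR={or_:.2f} 95% CI={ci[0]:.2f}-{ci[1]:.2f}")
```

With these hypothetical frequencies the p-value is on the order of 10⁻⁶: suggestive, but still above the conventional 5 × 10⁻⁸ genome-wide threshold, which illustrates why replication in further cohorts is valuable.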

FAM13A, a Rho-GAP domain-containing gene (Cohen et al, 2004), exhibits tumour suppressor activity by inhibiting the signal transduction molecule RhoA (Ridley, 2001). In COPD, RhoA activity has been shown to be involved in oxidative stress and impaired clearance of apoptotic cells (Richens et al, 2009). Similar to HMG-CoA reductase inhibitors (statins), Rho-GAP seems to modulate the HMG-CoA reductase enzyme, which may explain why statins could have the potential to protect against COPD and lung cancer (Young et al, 2009).

Five additional loci associated with FEV1 and FEV1/FVC

A meta-analysis of several GWA studies by the SpiroMeta Consortium identified five additional loci associated with FEV1 and FEV1/FVC (Repapi et al, 2010): tensin 1 (TNS1); glutathione S-transferase, C-terminal domain containing (GSTCD); advanced glycosylation end product-specific receptor (AGER); 5-hydroxytryptamine (serotonin) receptor 4 (HTR4); and thrombospondin, type I, domain containing 4 (THSD4).

By combining multiple GWA studies, the investigators were able to include 20,288 individuals of European ancestry, with 54,276 individuals in follow-up investigations. The power of the analysis was greatly increased by the larger quantity of genotype and phenotype data, which ultimately led to the identification of highly significant SNP associations (p-values ranging from 10⁻⁹ to 10⁻²³). Significant loci were detected for FEV1 at 4q24 (GSTCD), 2q35 (TNS1) and 5q33 (HTR4), and for FEV1/FVC at 6p21 (AGER) and 15q23 (THSD4). Another locus at 6p21, within the borders of dishevelled associated activator of morphogenesis 2 (DAAM2), showed a suggestive association with FEV1/FVC. GSTCD, HTR4 and AGER were identified independently in the GWA study by the CHARGE Consortium (Hancock et al, 2010). Both the SpiroMeta and CHARGE consortia also found associations at the HHIP locus (see above). The associations identified in this study did not change when adjusted for qualitative or quantitative smoking exposure, so the underlying genes are most likely not involved in smoking addiction. Nevertheless, a previous report showed a role for THSD4 in smoking cessation (Uhl et al, 2008). The mechanisms proposed to underlie these newly identified genes are developmental or tissue remodelling pathways that are important for airway architecture and lung repair.
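How pooling several GWA studies increases power can be illustrated with an inverse-variance weighted fixed-effect meta-analysis, a common way of combining per-study effect estimates. This review does not state the exact method used by the SpiroMeta Consortium, so the sketch below is an assumption; the six cohorts and their effects and standard errors are hypothetical, and `fixed_effect_meta` is an illustrative helper.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis of per-study effects.

    betas -- per-study effect estimates (e.g. per-allele change in FEV1, in SD units)
    ses   -- their standard errors
    Returns (pooled beta, pooled SE, z-score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_total
    se = math.sqrt(1.0 / w_total)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p_value


if __name__ == "__main__":
    # Six hypothetical cohorts, none of which is genome-wide significant alone.
    betas = [0.04, 0.06, 0.05, 0.03, 0.07, 0.05]
    ses = [0.020, 0.025, 0.020, 0.030, 0.025, 0.020]
    for b, s in zip(betas, ses):
        print(f"study: z={b / s:.2f}")
    beta, se, z, p = fixed_effect_meta(betas, ses)
    print(f"pooled: beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.1e}")
```

Pooling raises the z-score from at most 2.8 in any single cohort to above 5. A fixed-effect model assumes one common true effect across cohorts; random-effects models relax that assumption when studies are heterogeneous.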

SRY (sex determining region Y)-box 5 (SOX5) 

Linkage studies in the family-based BEOCOPD cohort identified a locus on Chr 12, but the gene of interest could not be isolated at that point (Silverman et al, 2002a,b). Thus, a systematic approach was applied to fine-map the region on Chr 12 by genotyping 1387 SNPs in 386 COPD cases from the NETT cohort and 424 healthy smokers from the NAS cohort (Hersh et al, 2011). Significant associations were located in an intergenic, gene-dense region, making the identification of a true candidate gene difficult. Replication of the significant SNPs was attempted in the BEOCOPD and ICGN cohorts. The most significant SNP in the BEOCOPD population (rs11046966) was located in close proximity (7 kb downstream) to the 3' end of SOX5. Further evidence supports SOX5 as a COPD candidate gene: COPD subjects showed reduced SOX5 gene expression, and abnormal embryonic lung development as well as decreased expression of the extracellular matrix molecule fibronectin were found in Sox5-/- mice. Even though replication of the SNP was not convincing in one of the replication populations (ICGN), the analysis in the mouse model suggests a role for SOX5 in developmental lung morphogenesis, which, as discussed, could decrease lung functional reserve in the adult.

Bicaudal D homolog 1 (BICD1) 

The investigations that led to the identification of BICD1 were
the first to use chest CT scans, allowing specific
characterization of the emphysema phenotype (Kong et al, 2011).
Until then, COPD patients had been characterized using spirometry,
which is a measure of airflow and not directly related to a
single COPD phenotype. Chest CT scans assess lung density,
which falls as the airspaces enlarge; this airspace enlargement
defines emphysema. Quantitative analysis and radiologist-based
qualitative scoring of CT images were investigated in this
GWA study using three different COPD cohorts (i.e. ECLIPSE,
NETT/NAS and the Bergen cohort from Norway). Interestingly, there
was only a slight overlap between the quantitative and the
qualitative phenotyping methods. The most significant variant, an
intronic SNP on 12p11.21 (rs10844154), was associated with the
qualitative assessment by the radiologist but not with the
quantitative method. This variant is located close to exon 2 of
BICD1. BICD1, a homolog of the Drosophila gene bicaudal-D
(BicD), is involved in the regulation of dynein function. Exon 2
harbours the binding region for dynein, a molecule involved
in mitosis, mRNA transport and dendritic and axonal vesicle
transport (Baens & Marynen, 1997). Previously, BICD1 had also
been linked to telomere shortening (Mangino et al,
2008), supporting recent theories that link COPD to aging
(Shapiro, 2011). Telomere shortening triggers cellular
senescence, especially in epithelial stem cells. Hence, short
telomeres can impair the maintenance of epithelial integrity,
leading to emphysema (Alder et al, 2011).
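Quantitative CT phenotyping can be as simple as measuring how much of the lung is abnormally low in density. A minimal sketch of one such density measure, the percentage of lung voxels below an attenuation threshold, assuming the voxel values and a lung mask are already available as flat lists; the -950 HU cut-off is a common convention in emphysema imaging but is not stated in the study above, and the function name is hypothetical. It does not reproduce the specific quantitative or radiologist-based methods used by Kong et al (2011).

```python
def percent_low_attenuation(hu_values, in_lung, threshold_hu=-950.0):
    """Percentage of lung voxels with attenuation below threshold_hu.

    hu_values: voxel attenuation values in Hounsfield units (HU), flattened
    in_lung:   booleans of the same length marking voxels inside the lung
               (assumed to come from a separate lung segmentation step)
    threshold_hu: cut-off for "low attenuation"; -950 HU is a commonly
                  used convention and an assumption here, not from the study
    """
    if len(hu_values) != len(in_lung):
        raise ValueError("image values and lung mask must have the same length")
    lung = [hu for hu, inside in zip(hu_values, in_lung) if inside]
    if not lung:
        raise ValueError("lung mask selects no voxels")
    # Enlarged airspaces hold more air and less tissue, so they appear
    # darker (lower HU) than normal lung parenchyma.
    low = sum(1 for hu in lung if hu < threshold_hu)
    return 100.0 * low / len(lung)


# Hypothetical voxels: the +40 HU voxel is soft tissue outside the mask.
voxels = [-1000, -960, -900, -850, 40, -970]
mask = [True, True, True, True, False, True]
print(percent_low_attenuation(voxels, mask))  # 60.0
```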

Sixteen novel genomic loci for lung function

A large-scale meta-analysis in combination with follow-up
investigations identified 16 novel genomic loci for lung function
(Soler Artigas et al, 2011): Microfibrillar-associated protein 2
(MFAP2), Transforming growth factor, beta 2 (TGFB2-LYPLAL1),
Histone deacetylase 4 (HDAC4-FLJ43879), Retinoic acid receptor,
beta (RARB), Ecotropic virus integration site 1 (MECOM (EVI1)),
Spermatogenesis associated 9 (SPATA9-RHOBTB3), Armadillo
repeat containing 2 (ARMC2), Natural cytotoxicity triggering
receptor 3 (NCR3-AIF1), Zinc finger with KRAB and SCAN
domains 3 (ZKSCAN3), Cell division cycle 123 homolog
(CDC123), Chromosome 10 open reading frame 11
(C10orf11), Low density lipoprotein receptor-related protein 1
(LRP1), Coiled-coil domain containing 38 (CCDC38), Matrix
metallopeptidase 15 (MMP15), Craniofacial development
protein 1 (CFDP1) and Potassium voltage-gated channel subfamily
E member 2 (KCNE2-LINC00310 (C21orf82)).

The authors evaluated 2.5 million SNPs from 23 individual
investigations (17 from the SpiroMeta consortium and 6 from
the CHARGE consortium) for FEV1 and FEV1/FVC in 48,201
individuals of European origin. The association testing, which
was stratified for smoking status (ever vs. never smoking),
revealed 29 new loci that were associated with lung function at a
p-value of less than 3 × 10^-6. Those loci were followed up in
another 17 studies using in silico and newly genotyped data. A
second meta-analysis across the original and follow-up studies
identified SNP associations with p-values of < 5 × 10^-8 in 16
of the 29 new loci. Those 16 SNPs are located within or in
close proximity to MFAP2 and TGFB2-LYPLAL1 on Chr1;
HDAC4-FLJ43879 on Chr2; RARB and MECOM (EVI1) on
Chr3; SPATA9-RHOBTB3 on Chr5; ARMC2, NCR3-AIF1 and
ZKSCAN3 on Chr6; CDC123 and C10orf11 on Chr10; LRP1
and CCDC38 on Chr12; MMP15 and CFDP1 on Chr16; and
KCNE2-LINC00310 (C21orf82) on Chr21. Some of these new
loci are known to be involved in molecular mechanisms
that regulate lung function. For example, MFAP2 is an
antigen of elastin-associated microfibrils (Gibson et al, 1986)
and RARB has previously been linked to premature alveolar
septation (Massaro et al, 2000). CDC123 plays an important
role in the response to cell stress by regulating eukaryotic
initiation factor 2 (Bieganowski et al, 2004). HDACs have already
been recognized in COPD for their regulatory function in gene
expression (Ito et al, 2005), and TGFB2 is known to modulate
epithelial repair processes and extracellular collagen
accumulation (Thompson et al, 2006). Finally, KCNE2 is
potentially involved in ion transport in airway epithelial cells
(Cowley & Linsdell, 2002).
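The two thresholds in this design play different roles: the looser discovery threshold (3 × 10^-6) only selects loci for follow-up, while the conventional genome-wide threshold (5 × 10^-8) is applied to the combined evidence. The value 5 × 10^-8 is usually motivated as a Bonferroni correction of alpha = 0.05 for roughly one million independent common-variant tests; that rationale is general background, not something stated in the study. The sketch below, with hypothetical locus names and p-values, shows only the filtering logic and does not perform the meta-analysis itself.

```python
# Thresholds quoted in the text for the two-stage design.
DISCOVERY_P = 3e-6     # stage 1: loci carried forward to follow-up
GENOME_WIDE_P = 5e-8   # stage 2: significance in the combined meta-analysis


def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_independent_tests


def two_stage_loci(discovery_p, combined_p):
    """Apply a discovery filter, then the genome-wide threshold.

    discovery_p: dict locus -> p-value from the discovery meta-analysis
    combined_p:  dict locus -> p-value from discovery plus follow-up studies
    Returns (followed_up, confirmed) as sets of locus identifiers.
    """
    followed_up = {locus for locus, p in discovery_p.items() if p < DISCOVERY_P}
    # Loci without a combined result are treated as not confirmed.
    confirmed = {locus for locus in followed_up
                 if combined_p.get(locus, 1.0) < GENOME_WIDE_P}
    return followed_up, confirmed


# About one million independent tests motivates the 5 x 10^-8 threshold.
print(bonferroni_threshold(0.05, 1_000_000))

# Hypothetical loci: only "locus_A" survives both stages.
discovery = {"locus_A": 1e-7, "locus_B": 1e-5, "locus_C": 2e-6}
combined = {"locus_A": 1e-9, "locus_C": 1e-7}
print(two_stage_loci(discovery, combined))
```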

Member of RAS oncogene family (RAB4B), Egl nine homolog 2
(EGLN2), melanoma inhibitory activity (MIA), cytochrome
P450 2A6 (CYP2A6)

Another large meta-analysis of GWA studies was performed for
COPD, pre-bronchodilator FEV1 and severe COPD, comparing
3499 cases with 1922 controls (Cho et al,
2012). The subjects were obtained from the following four
populations: ECLIPSE, NAS and NETT, the Bergen (Norway)
cohort and the COPDGene study. Illumina platforms were
used for genotyping, and missing SNPs were imputed using
the 1000 Genomes data. This study identified a new locus on
Chr 19q13 (rs7937), which reached genome-wide significance
with a p-value of 10^-9. The association at this locus was
replicated in 2859 subjects of the family-based ICGN cohort,
strengthening the already strong evidence for this new locus.
Genes within this genomic region are RAB4B, EGLN2, MIA
and CYP2A6. While RAB4B, EGLN2 and MIA are of potential
interest due to their expression in developing animal and human
lung (Groenman et al, 2007; Lin et al, 2008; Otulakowski et al,
2009), CYP2A6 has previously been associated with lung cancer
and has been shown to be involved in nicotine metabolism
(Hukkanen et al, 2005; London et al, 1999; Nakajima et al,
1996), in particular in the metabolism of cotinine, the major
nicotine metabolite (Thorgeirsson et al, 2010).

Genes identified by gene-association studies 

Early gene-association studies for COPD were often conflicting
due to a variety of methodological issues (Silverman, 2006),
particularly small sample sizes and lack of replication
populations. However, despite candidate bias, if properly done,
these types of studies can be powerful. Hunninghake et al (2009)
performed an association study in which the investigators
examined the association between MMP12 variants and the lung
function phenotype FEV1. Unlike
many previous association studies, this investigation was well
controlled for age, sex, height and exposure to smoke, and it
included a very large number of participants: more than 8300
subjects were studied with >20,000 FEV1 measurements performed in
seven study cohorts [(1) Genetics of Asthma in Costa Rica
Study; (2) Childhood Asthma Management Program (CAMP);
(3) Children, Allergy, Milieu, Stockholm, Epidemiological
Survey; (4) BEOCOPD; (5) NETT; (6) Lovelace Smokers Cohort;
(7) NAS]. This design greatly improved the power to identify
true disease variants. Indeed, the minor allele (G) of a SNP
(rs2276109) in the MMP12 promoter region at 11q22.3 was
significantly associated with FEV1 in all seven cohorts,
particularly in adult smokers, in whom it was also associated
with the risk of COPD.
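Adjusting for age, sex, height and smoking exposure, as described above, typically means entering those variables as covariates alongside the genotype in a regression model. A minimal sketch under simplifying assumptions: an additive genetic model (genotype coded as 0, 1 or 2 copies of the minor allele), one FEV1 value per subject and ordinary least squares. The study itself analysed repeated FEV1 measurements across seven cohorts, which this sketch does not model; the function names, covariate set and simulated data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: covariates are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_allele_effect(genotypes, covariates, outcome):
    """OLS fit of outcome ~ 1 + genotype + covariates; returns the genotype slope.

    genotypes:  minor-allele counts (0, 1 or 2) per subject (additive model)
    covariates: per-subject lists such as [age, sex, height, pack_years]
    outcome:    per-subject trait values, e.g. FEV1 in litres
    """
    x = [[1.0, float(g)] + [float(v) for v in cov]
         for g, cov in zip(genotypes, covariates)]
    k = len(x[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, outcome)) for i in range(k)]
    return solve_linear(xtx, xty)[1]


# Simulated, noise-free data (hypothetical coefficients): each minor allele
# adds 0.05 L to FEV1 after accounting for the covariates.
geno, covs, fev1 = [], [], []
for i in range(40):
    g, age, sex = i % 3, 40 + (i * 7) % 30, (i // 2) % 2
    height, pack_years = 160 + (i * 13) % 35, (i * 11) % 50
    geno.append(g)
    covs.append([age, sex, height, pack_years])
    fev1.append(1.0 + 0.05 * g - 0.03 * age + 0.4 * sex
                + 0.04 * height - 0.01 * pack_years)
print(round(adjusted_allele_effect(geno, covs, fev1), 4))
```

Solving the normal equations directly keeps the sketch dependency-free; a numerical library's least-squares routine would be the more robust choice for real data.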

MMP12 was previously suggested to play a central role in 
COPD due to its elastase activity and the fact that MMP12 null 
mutant mice were entirely protected from cigarette smoke- 
induced emphysema (Hautamaki et al, 1997). The identified 
variant in the MMP12 promoter mediates decreased promoter 
activity by diminishing AP-1 binding, which leads to decreased 
MMP12 expression (Wu et al, 2003). As predicted, less MMP12 
expression protected against COPD. Interestingly, this study 
also suggests that MMP12 is a candidate gene for asthma, 
particularly in smokers. 

Animal models to dissect COPD sub-phenotypes 

Phenotype analysis 

Animal models were fundamental in formulating the
elastase/antielastase hypothesis over 45 years ago, which remains
the cornerstone of COPD pathogenesis. At that time, Gross et al
(1965) instilled papain into experimental animals, resulting in
the airspace enlargement that defines emphysema.
Subsequently, a variety of animal models have been used
to further our understanding of COPD. Models include exposure 
of animals to molecular, chemical and environmental agents 
that lead to airspace enlargement (Shapiro, 2000). In particular, 
elastases (Janoff et al, 1977; Kao et al, 1988; Senior et al, 1977), 
cigarette smoke (Snider et al, 1986; Wright & Churg, 1990), and 
more recently, inducers of apoptosis (Kasahara et al, 2000) have 
been most informative. Over- and under-expression of proteins
in transgenic, gene-targeted and natural mutant mice
has been extremely useful in exploring the pathogenesis of
COPD (D'Armiento et al, 1992; Shipley et al, 1996). No single
animal model recapitulates human COPD in its entirety, but
several reproduce features associated with the disease (Hautamaki
et al, 1997). An advantage of studying COPD as compared to
many other diseases is that we know what causes it: cigarette
smoke exposure. Of note, however, mouse lung structure is
not identical to human lung structure. For example,
mice have few submucosal glands and much less airway
branching, and they lack respiratory bronchioles. However,
upon exposure to cigarette smoke, mice do develop
important changes similar to those in humans, including
inflammation with neutrophils, macrophages and T cells followed
by airspace enlargement that is easily detectable in many, but
not all, strains at 6 months (Hautamaki et al, 1997). With respect
to the airways, upon cigarette smoke exposure, mice lose cilia,
develop goblet cell hypertrophy and show submucosal fibrosis.
Importantly, all of these changes depend on the
mouse strain. Indeed, phenotypes measured in
multiple mouse strains can be used in genome-wide scans (genetic
mapping studies analogous to GWA studies in humans) to identify
disease-causing genetic variants.
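In the simplest form of such a scan, each inbred strain contributes one mean phenotype value and, because inbred strains are homozygous, one allele per SNP; each SNP is then tested for whether strains sharing an allele also share a phenotype. A minimal sketch under those assumptions, using a correlation t statistic per SNP; the strain names, phenotype values and function name are hypothetical. Published mouse scans typically also correct for genetic relatedness among strains (for example with mixed models), which this sketch ignores, and it reports t statistics rather than p-values.

```python
import math


def strain_association(strain_means, snp_alleles):
    """Rank SNPs by association with a strain-level phenotype.

    strain_means: dict strain -> mean phenotype (e.g. airspace enlargement)
    snp_alleles:  dict snp_id -> dict strain -> allele coded 0 or 1;
                  inbred strains are assumed homozygous, so one allele
                  describes each strain
    Returns (snp_id, t_statistic) pairs sorted by decreasing |t|, where t
    tests the correlation between allele and strain mean.
    """
    results = []
    for snp, alleles in snp_alleles.items():
        strains = [s for s in strain_means if s in alleles]
        n = len(strains)
        x = [float(alleles[s]) for s in strains]
        y = [float(strain_means[s]) for s in strains]
        if n < 3 or len(set(x)) < 2:
            continue  # too few strains or monomorphic SNP: nothing to test
        mean_x, mean_y = sum(x) / n, sum(y) / n
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sxx = sum((a - mean_x) ** 2 for a in x)
        syy = sum((b - mean_y) ** 2 for b in y)
        if syy == 0:
            continue  # phenotype does not vary across these strains
        r = sxy / math.sqrt(sxx * syy)
        if abs(r) >= 1.0:
            t = math.copysign(math.inf, r)  # alleles perfectly separate strains
        else:
            t = r * math.sqrt((n - 2) / (1.0 - r * r))
        results.append((snp, t))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical example with six strains and three SNPs.
means = {"S1": 1.0, "S2": 1.2, "S3": 0.9, "S4": 3.0, "S5": 3.1, "S6": 2.8}
snps = {
    "snp_a": {"S1": 0, "S2": 0, "S3": 0, "S4": 1, "S5": 1, "S6": 1},
    "snp_b": {"S1": 0, "S2": 1, "S3": 0, "S4": 1, "S5": 0, "S6": 1},
    "snp_mono": {s: 0 for s in means},  # monomorphic, skipped
}
for snp_id, t in strain_association(means, snps):
    print(snp_id, round(t, 2))
```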

Murine genome-wide scans 

Using mice in GWA studies can help to accelerate the 
identification of the genetic basis of complex human diseases. 
Identifying the genetic basis responsible for phenotypic 
variations in mouse models is most successful when using 
dense SNP panels and phenotypic measures across several 
laboratory strains. It has been suggested that successful 
genome-wide studies in the mouse require at least 30 different 
strains (Cervino et al, 2007). In recent years, investigators 
have performed large-scale phenotyping studies for several 
disease traits across multiple strains. Currently, high- 
throughput phenotyping efforts are underway to characterize 
pathological changes in the lung in response to acute and 
chronic cigarette smoke exposure. The success of genome-wide 
scans in the mouse depends on the availability and accuracy
of genotype information. SNP panels are available through
multiple institutions. For example, several million SNPs
for close to 100 mouse strains are provided through the
HapMap SNP project (http://snp.cshl.org/) and the Center
for Genome Dynamics (http://cgd.jax.org/). These high-density
SNP panels achieve complete SNP coverage across the
examined strains through imputation. Imputation methods differ
in their error rates depending on the algorithm used to predict
missing SNPs (Wang et al, 2012). An alternative source of
non-imputed genotype information is the whole-genome
sequence data available for 18 strains through
the Wellcome Trust Sanger Institute (Keane et al, 2011; Yalcin
et al, 2011).
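One straightforward way to compare imputation methods is to measure how often an imputed genotype disagrees with a genotype determined directly, for example from whole-genome sequence data such as that described above. A minimal sketch, assuming genotype calls keyed by (strain, SNP) pairs; the data layout, names and values are hypothetical, and this simple discordance rate is not necessarily the error metric used by Wang et al (2012).

```python
def imputation_discordance(imputed, observed):
    """Fraction of genotype calls on which imputation disagrees with a
    directly determined genotype (e.g. from whole-genome sequence).

    imputed, observed: dicts mapping (strain, snp_id) -> genotype call.
    Only keys present in both dicts are compared.
    """
    shared = [key for key in imputed if key in observed]
    if not shared:
        raise ValueError("no overlapping genotype calls to compare")
    mismatches = sum(1 for key in shared if imputed[key] != observed[key])
    return mismatches / len(shared)


# Hypothetical calls for two strains at two SNPs; one imputed call is wrong.
imputed = {("A", "snp1"): "GG", ("A", "snp2"): "TT",
           ("B", "snp1"): "GG", ("B", "snp2"): "CC"}
sequenced = {("A", "snp1"): "GG", ("A", "snp2"): "CC",
             ("B", "snp1"): "GG", ("B", "snp2"): "CC",
             ("C", "snp1"): "AA"}  # strain C was not imputed; ignored
print(imputation_discordance(imputed, sequenced))  # 0.25
```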

In addition, the mouse is a good model for applying advanced
bioinformatic techniques to verify the correctness of a potential
locus. Identified genes can easily be examined at the SNP level
and for expression differences at the mRNA and protein levels
(e.g. by Western blot or immunohistochemistry). Prediction
algorithms such as SIFT by the J. Craig Venter Institute
(http://sift.jcvi.org/) or PolyPhen2 by the Sunyaev laboratory at
Harvard (http://genetics.bwh.harvard.edu/pph2/) can help to
identify functionally important non-synonymous SNPs. Finally,
verification of newly discovered genes is possible in genetically
engineered mice (e.g. transgenic and conditional knockout
mice). Studying the genetic basis of COPD in mice may help to
tease out molecular pathways that are difficult to unravel in
human cohorts because of ethical considerations. Confirmation
in humans of the importance of MMP12, first suggested by
mouse studies, is one example of the potential to translate
findings in mice to humans.
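Tools such as SIFT and PolyPhen2 score variants that change the encoded amino acid, so a preliminary step is to recognize which coding SNPs are non-synonymous at all. A minimal sketch of that preliminary step only, assuming a sense-strand coding sequence, 0-based positions and the standard genetic code; it does not reproduce the scoring performed by SIFT or PolyPhen2, and the function and variable names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second and third base in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_coding_snp(cds, position, alt_base):
    """Classify a single-base substitution in a coding sequence.

    cds:      sense-strand coding sequence starting at the start codon
    position: 0-based index of the substituted base within cds
    alt_base: the alternative base (A, C, G or T)
    Returns "synonymous", "missense" or "nonsense"; for simplicity a change
    that turns a stop codon into an amino acid is also reported as "missense".
    """
    cds, alt_base = cds.upper(), alt_base.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if not 0 <= position < len(cds):
        raise IndexError("position lies outside the coding sequence")
    start, offset = position - position % 3, position % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    return "nonsense" if alt_aa == "*" else "missense"


# Hypothetical three-codon sequence: ATG (Met) GAA (Glu) TAA (stop).
seq = "ATGGAATAA"
print(classify_coding_snp(seq, 5, "G"))  # GAA -> GAG, Glu -> Glu: synonymous
print(classify_coding_snp(seq, 4, "T"))  # GAA -> GTA, Glu -> Val: missense
print(classify_coding_snp(seq, 3, "T"))  # GAA -> TAA, Glu -> stop: nonsense
```

Only the missense calls would then be candidates for scoring by a tool such as SIFT or PolyPhen2.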

Future directions 

Although we have come a long way since the discovery of AAT,
much about the genetic basis of COPD remains to be discovered.
The key to understanding COPD susceptibility is the
identification of true genetic variants. This requires advances
in the way we perform genome-wide studies with respect to both
phenotyping and genotyping. To understand obstructive lung
diseases such as COPD, our attention is turning towards
improved and more precise phenotyping. Use of electronic
health records will also allow investigators to link individual
variation in disease manifestations to underlying genetics.
Another limiting factor for successful genome-wide studies is
the accuracy and density of the genotype information. The aim is
to use whole-genome DNA and RNA sequence data so that
imputation is no longer necessary and SNP density is
maximal. As costs continue to decrease, whole-genome
technology is becoming practical for patient populations.

Once genes are identified, we must then put them into 
molecular pathways or networks and identify the role of these 
pathways in disease pathogenesis. Unbiased approaches are 
critical to identify genes and pathways not yet considered. 
However, many of the discovered genes are not well described 
and teasing out their function and role in COPD is not always 
straightforward. This problem is manifest in this Review, where 
it is not yet possible to place the genes in coherent networks 
that truly inform about the mechanisms of COPD. Once critical 
pathways are identified, investigators can work on means to
inhibit those pathways, leading to disease-modifying therapy.
Understanding the genetics of COPD is also necessary for the
development of personalized medicine. We look forward to a
day when genetic information is a routine part of patient care,
informing the physician of a patient's disease susceptibility,
course, potential complications, co-morbidities and treatment. This,
and the elimination of cigarette smoking, will ultimately lower 
the burden of COPD. 
Pending issues

• Develop automated CT scan analysis for precise phenotype analysis
of peripheral lung diseases, such as emphysema.

• Test and verify identified candidate genes and variants for
functionality in COPD susceptibility.

• Utilize whole-genome DNA and RNA sequencing information in
association studies to identify novel genome variants and verify
those already found.